(54) Two-component bandwidth scheduler having application in multi-class digital
communication systems 



(57) The method for servicing queues (14) holding data packets (12) for subsequent transmission to a
communications link (16) comprises the steps of servicing each queue by forwarding its data packets to the
link at time intervals corresponding to a guaranteed service rate of the queue, provided the queue is
non-empty; and, during time intervals when none of the queues have packets being forwarded to the link in
conformance with the above step, servicing the queues in accordance with a proportion of idle bandwidth
allocated to each queue. The method is preferably carried out by a hierarchical scheduler (10) comprising
an exhaustive scheduler (30) servicing a plurality of lower level schedulers in accordance with non-equal
priority levels assigned thereto; a non-work conserving shaper scheduler (20) feeding the exhaustive
scheduler; and a work conserving idle bandwidth scheduler (25) feeding the exhaustive scheduler (30). Each
queue (14) concurrently contends for service from the shaper scheduler (20) and the idle bandwidth
scheduler (25). The shaper scheduler (20) servicing the queue has a higher priority level with respect to
the exhaustive scheduler (30) than the idle bandwidth scheduler (25) servicing the same queue. The
technique distributes the idle bandwidth of the communications link (16) in a way which is de-coupled from
the guaranteed service rates of the queues, thereby providing a more efficient distribution of the total
available bandwidth.
Description

Field of Invention 

[0001] The invention generally relates to the art of scheduling systems wherein messages associated with plural
processes are stored in a number of queues for subsequent processing by a single resource having limited processing
capability. The invention has particular application to the field of digital communications systems and in this aspect
relates to a scheduler and related method for efficiently allocating the bandwidth of a communications link amongst
multiple queues which may be associated with a variety of service classes.

Background of Invention 

[0002] In various types of communication systems, including Asynchronous Transfer Mode (ATM) systems, situations
often arise where a number of connections vie for the bandwidth of a communications link in a communication device,
such as at a network node. When such a situation arises, it is necessary to queue or buffer data packets or cells from
the contending connections, and the queues must be serviced in some "fair" way in order to ensure that all of the
connections are adequately serviced.

[0003] A similar situation arises in the more general case where plural processes contend for a single resource. For
instance, a distributed processing system may comprise a number of local controllers, responsible for various facets of
the system, which are connected to a central controller, responsible for the overall management of the system. The
local controllers communicate with the central controller by sending it messages, which the central controller must
process, i.e., act upon. In this sense, the local controllers present "jobs" to the central controller. At any instant of
time, some of the local controllers will not be busy, having no messages which must be processed by the central
controller. Concurrently, some of the local controllers will be busy, presenting multiple messages, and hence potential
jobs, to the central controller. Since the central controller may be busy with other jobs, it stores the messages in
various queues, e.g., according to the type or class of local controller from which the message originated, until such
time as the central controller can process the message and carry out the associated job. These messages must also be
serviced in some fair way to ensure that all of the local controllers are adequately handled. It will be seen from the
foregoing that the messages or jobs correspond to data packets of the digital communication system, and the fixed
processing power or speed of the central controller corresponds to the bandwidth of the communications link.

[0004] A common "fair" scheduling scheme is proportional weighted fair queuing (hereinafter "proportional WFQ")
wherein each queue, corresponding to each connection, is assigned a weight proportional to its allocated service rate.
The proportional WFQ scheduler uses this weight to determine the amount of service given to the queue such that the
scheduler is able to provide the allocated service rate for a given connection over a reasonably long busy period (i.e.,
when its queue is continuously non-empty), provided that the scheduler is not over-booked. The notion of an allocated
service rate suits ATM systems in particular because almost all of the five currently defined ATM service classes rely
on rate as a basis for defining quality of service (QoS). For instance, constant bit rate (CBR) connections are
guaranteed a cell loss ratio (CLR) and delay for cells that conform to the peak cell rate (PCR). Variable bit rate (VBR)
connections, real-time and non-real-time, are also guaranteed a CLR and delay for cells that conform to the sustained
cell rate (SCR) and PCR. An available bit rate (ABR) connection is given a variable service rate that is between a
minimum cell rate (MCR) and PCR. Unspecified bit rate (UBR) connections are associated with PCRs, and are soon
anticipated to also be associated with MCRs.

[0005] In addition to the allocated service rate, because a proportional WFQ scheduler is work conserving, each
non-empty queue will also receive a certain amount of instantaneous idle bandwidth. This is the extra service bandwidth
that a queue receives due to (1) any unallocated bandwidth of a communications link, and (2) any allocated but currently
unused bandwidth arising from the idle, non-busy periods of the other queues at the contention point.

[0006] To explain this in greater detail, suppose that queue n is given a weight $\phi_n$ which is proportional to the
allocated service rate queue n should receive. The proportional WFQ scheduler thus distributes the total allocated
bandwidth of the communication link amongst all the queues in proportion to their allocated service rates. Consequently,
the idle bandwidth of the link is also distributed in proportion to the allocated service rates of all the non-empty
queues. An example of this is shown in Figure 1(a) where four queues 14, corresponding to four connections A, B, C & D,
are serviced by a proportional WFQ multiplexer 8 in order to produce an output cell stream or link 16. Connections A, B
& C have allocated service rates equal to 30% of the total bandwidth associated with the link 16 and are thus equally
weighted. The allocated service rate of connection D is equal to 10% of the total bandwidth of link 16. Figure 1(b) is a
bandwidth occupancy chart illustrating how the link bandwidth is allocated to the connections. From time t = 0 to 8, each
of the connections has cells requiring servicing and thus the instantaneous bandwidth received by each connection is
25% of the total bandwidth. At time t = 8, however, only connections B and D are non-empty having cells to be serviced,
and thus the instantaneous idle bandwidth (now being 50% of the total bandwidth) is allocated to connections B & D in
proportion to their allocated service rates. Thus, at time t = 8, connection B receives 75% of the instantaneous total
bandwidth and connection D receives 25% of the instantaneous total bandwidth. In general, the theoretical instantaneous
service that queue n receives at time t when it is non-empty is $\phi_n / \sum_{i \in A(t)} \phi_i$, where A(t) is the
index set of non-empty queues at time t.
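
The share formula at the end of paragraph [0006] can be illustrated with a minimal Python sketch. The function name `instantaneous_shares` and the dictionary layout are illustrative assumptions, not part of the specification; the weights are the allocated rates of the Figure 1(a) example.

```python
def instantaneous_shares(weights, non_empty):
    """Share of the link each non-empty queue receives under proportional WFQ.

    weights   -- mapping of queue name to its weight (allocated service rate)
    non_empty -- the index set A(t) of queues that currently hold cells
    Assumes at least one non-empty queue has a non-zero weight.
    """
    total = sum(weights[q] for q in non_empty)
    return {q: weights[q] / total for q in non_empty}


# Weights from the Figure 1(a) example: A, B and C at 30%, D at 10%.
weights = {"A": 30, "B": 30, "C": 30, "D": 10}

# At t = 8 only B and D are non-empty, so they split the link 3:1.
shares = instantaneous_shares(weights, {"B", "D"})
print(shares["B"], shares["D"])
```

With only B and D backlogged the sketch yields 0.75 and 0.25, matching the 75% and 25% figures quoted for time t = 8.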

[0007] Suppose then that a proportional WFQ scheduler is used in an ATM communications device, such as a network
node. A CBR connection should have an allocated service rate equal to its PCR. A VBR connection should have
an allocated service rate, VBW (virtual bandwidth), which is at least equal to its SCR and less than its PCR. (VBW is
typically statistically calculated at set up by the connection and admission control (CAC) function of a network.) An ABR
connection should have an allocated service rate equal to its MCR, and a UBR connection should have an allocated
service rate equal to zero. So, in such a scenario, the amount of idle bandwidth that a CBR connection sees is
proportional to its PCR, and that an ABR connection sees is proportional to its MCR. This may result in very undesirable
service. For example, suppose that a switch is carrying four connections (only): one is CBR with PCR = 980 kbps, two
connections are ABR with MCR = 10 kbps, and one is UBR. Consequently, the idle bandwidth distribution is 98% for
the CBR connection and 1% for each of the ABR connections, assuming a period when all the connections are busy.
Such a distribution is certainly not desirable, since CBR connections should generally not receive service bandwidth
beyond their PCRs. ABR connections would get extra bandwidth in proportion to their MCRs, a phenomenon commonly
termed MCR proportional service. MCR proportional service is one way of fairly distributing idle bandwidth, but the
literature has other methods, such as MCR plus fair share, which proportional WFQ cannot support. And the UBR
connection only gets service if all the other queues are empty. Such distributions of the idle bandwidth are not optimal,
and hence it is desired to achieve a more efficient distribution of the idle bandwidth.
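
As a worked check of the figures in paragraph [0007], the proportional WFQ share of each connection is its allocated rate divided by the sum of the allocated rates of all busy connections. Taking the rates in kbps as the weights (980 for the CBR connection, 10 for each ABR connection and 0 for the UBR connection):

```latex
\[
\frac{980}{980 + 10 + 10 + 0} = 0.98, \qquad
\frac{10}{980 + 10 + 10 + 0} = 0.01, \qquad
\frac{0}{980 + 10 + 10 + 0} = 0 .
\]
```

The zero share for the UBR connection is why it is serviced only when every other queue is empty.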

Summary of Invention 

[0008] Generally speaking, the invention provides a method for servicing a plurality of queues holding messages,
such as data packets, destined for processing by a resource having a finite processing bandwidth, such as a
communications link having a finite transmission bandwidth. The method comprises the steps of: (a) provisioning each queue
with a minimum guaranteed service rate; (b) provisioning each queue with an idle bandwidth proportion; (c) servicing
each queue by forwarding messages thereof to the resource at time intervals corresponding to the minimum guaranteed
service rate of the queue, provided the queue is non-empty; and (d) servicing the queues in accordance with the
proportion of idle bandwidth allocated to each queue during time intervals when none of the queues have packets being
forwarded to the resource in conformance with step (c). In this manner, the amount of instantaneous idle bandwidth that
a queue receives is decoupled from the allocated service rate granted to the queue.

[0009] In the preferred embodiment the above method is carried out by a hierarchical scheduler which comprises (a)
an exhaustive scheduler servicing a plurality of lower level schedulers in accordance with non-equal priority levels
assigned thereto; (b) a non-work conserving shaper scheduler feeding the exhaustive scheduler; and (c) a work
conserving idle bandwidth scheduler feeding the exhaustive scheduler. The exhaustive scheduler is configured so that the
shaper scheduler is given exhaustive priority over the idle bandwidth scheduler. The hierarchical scheduler is coupled
to the queues such that each queue concurrently contends for service from the shaper scheduler and from the idle
bandwidth scheduler.

[0010] The non-work conserving shaper scheduler, such as a virtual clock shaper described below, generates a
stream of data packets at a constant average bit rate. Since the shaper scheduler servicing a particular queue (which
may correspond to one connection) has a higher priority than the work conserving idle bandwidth scheduler, such as a
WFQ scheduler, the queue is guaranteed its allocated service rate during its busy period. However, the shaper
scheduler does not always submit messages (or in the preferred embodiment, the identity of queues) to the exhaustive
scheduler because not all queues are busy at all times, and even if a queue is busy, it may not be eligible to be serviced
due to the non-work conserving nature of shaping. These periods constitute the idle bandwidth of the resource. During this
"idle" time, the lower priority work conserving idle bandwidth scheduler servicing the queue is able to feed the
exhaustive scheduler. The idle bandwidth scheduler distributes this idle bandwidth in a manner which is preferably
non-dependent upon the guaranteed service rates allocated to the queues. In the preferred embodiment, the idle bandwidth
scheduler partitions the instantaneous idle bandwidth in a fixed manner or ratio between QoS classes, and equally
between all connections associated with a particular QoS class.

[0011] In various embodiments described herein, the shaper scheduler and the idle bandwidth scheduler are also 
each preferably composed of a plurality of subschedulers in order to more flexibly accommodate the distribution of idle 
bandwidth in an ATM application environment as explained in greater detail below. 
[0012] According to another broad aspect of the invention, there is provided a hierarchical scheduler for servicing a
plurality of queues holding messages. This scheduler comprises an exhaustive sub-scheduler servicing a plurality of
lower level sub-schedulers in accordance with non-equal priority levels assigned thereto; M non-work conserving
shaper sub-schedulers feeding the exhaustive sub-scheduler; and N work conserving idle bandwidth sub-schedulers
feeding the exhaustive sub-scheduler. A given queue concurrently contends for service from one of the shaper
sub-schedulers and from one of the idle bandwidth sub-schedulers, and the shaper sub-scheduler servicing the given queue
has a higher priority level with respect to the exhaustive sub-scheduler than the idle bandwidth sub-scheduler servicing
the given queue.

Brief Description of Drawings 

[0013] The foregoing and other aspects of the invention will become more apparent from the following description of
the preferred embodiments thereof and the accompanying drawings which illustrate, by way of example only, the
principles of the invention. In the drawings:

Figure 1(a) is a diagram exemplifying the queue arbitration problem that a prior art proportional WFQ scheduler or
multiplexer has to manage;

Figure 1(b) is a bandwidth occupancy chart showing how the link bandwidth is allocated to the queues over time
by the prior art proportional WFQ scheduler under the conditions shown in Figure 1(a);

Figure 2 is a functional block diagram illustrating a hierarchical scheduler in accordance with a first preferred
embodiment of the invention;

Figure 3 is a functional block diagram illustrating a hierarchical scheduler in accordance with a second preferred
embodiment of the invention;

Figure 4 is a flowchart illustrating, at a high level, a method according to the first and second preferred
embodiments for implementing the hierarchical scheduler shown in Figures 2 and 3;

Figure 5 is a flowchart illustrating a packet pre-processing stage of the flowchart of Figure 4 in greater detail;

Figure 6 is a flowchart illustrating an output processing stage of the flowchart of Figure 4 in greater detail in
accordance with the first preferred embodiment;

Figure 7 is a flowchart illustrating an embellishment to the output processing stage shown in the flowchart of Figure
6 in accordance with the second preferred embodiment;

Figure 8 is a functional block diagram illustrating a hierarchical scheduler in accordance with a third preferred
embodiment of the invention; and

Figure 9 is a functional block diagram illustrating a hierarchical scheduler in accordance with a fourth preferred
embodiment of the invention.

Detailed Description of Preferred Embodiments 

[0014] Figure 2 is a functional block diagram illustrating a hierarchical scheduler 10 in accordance with a first preferred
embodiment. As described above, the task of scheduler 10 is to schedule data packets 12 stored in a plurality of input
queues 14 to a limited resource, such as output communications link 16, which has a fixed bandwidth or service rate
associated therewith. In the preferred embodiments, the data packets 12 are fixed length ATM or ATM-like cells.
Furthermore, scheduler 10 is synchronous in that the data packets 12 are dequeued and transmitted to the
communications link 16 at a rate corresponding to the fixed link bandwidth. In other words, the communications link 16 can be
viewed as being logically divided into time slots, such that scheduler 10 de-queues one data packet 12 per time slot.

[0015] According to the first preferred embodiment, scheduler 10 comprises three "sub-schedulers" 20, 25 and 30,
i.e., substantially independent schedulers, which are interconnected in a two-level, hierarchical arrangement. (For
notational purposes, sub-schedulers 20, 25 and 30 will be referred to merely as "schedulers" since these schedulers
may themselves be composed of sub-schedulers, as shown, for example, in Figure 3.) An exhaustive scheduler 30 is
disposed at the top or primary level which provides the output to the communications link 16. The primary exhaustive
scheduler, which is work conserving, serves a secondary non-work conserving shaper scheduler 20, described in
greater detail below, and a secondary work conserving idle bandwidth scheduler 25 (such as a WFQ scheduler) also
described in greater detail below. The two secondary schedulers 20 and 25 concurrently service the queues 14. The
exhaustive scheduler 30, also known in the art as a static or strict priority scheduler, services processes in accordance
with priority levels assigned thereto. Thus, at any given time, the primary exhaustive scheduler 30 services the
secondary scheduler with the highest priority level, provided the latter is "busy", i.e., requires servicing. In the preferred
embodiment, the secondary shaper scheduler 20 is assigned a higher priority level than the secondary idle bandwidth
scheduler 25, thereby ensuring that the former will always be serviced ahead of the latter when busy.

[0016] The secondary schedulers 20 and 25 simultaneously "serve" the set of queues 14 by examining the queues
and selecting one of them in order to dequeue a data packet therefrom. When the secondary schedulers 20 and 25
have selected a queue to serve, they do not remove data packets from the queues; instead queue identifiers are
submitted to the primary scheduler 30. Thus, at each time slot the secondary schedulers 20 and 25 simultaneously and
independently submit the identifier of an eligible queue, if any, to the primary scheduler 30. Once the secondary
schedulers 20 and 25 have submitted the queue identifiers, the primary scheduler 30 then serves the highest priority
secondary scheduler which submitted an eligible queue by dequeuing the head-of-line (HOL) data packet from the queue
identified by that secondary scheduler.
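
The per-slot arbitration described in paragraph [0016] can be sketched as follows. This is an illustrative Python sketch only: the function name `exhaustive_select`, the use of `None` for "no eligible queue" and the queue contents are assumptions made for the example.

```python
from collections import deque
from typing import Optional, Sequence


def exhaustive_select(submissions: Sequence[Optional[int]]) -> Optional[int]:
    """Return the queue id offered by the highest-priority busy secondary scheduler.

    submissions is ordered from highest to lowest priority; None means the
    scheduler at that position has no eligible queue in this time slot.
    """
    for queue_id in submissions:
        if queue_id is not None:
            return queue_id
    return None


# Two queues holding hypothetical cells.
queues = {0: deque(["a1"]), 1: deque(["b1"])}

# In this slot the shaper offers nothing and the idle bandwidth scheduler
# offers queue 1, so the primary scheduler dequeues the HOL cell of queue 1.
winner = exhaustive_select([None, 1])
cell = queues[winner].popleft() if winner is not None else None
```

Note that the secondary schedulers only name a queue; the cell itself is removed once, by the primary scheduler, which keeps the two secondary schedulers from both dequeuing in the same slot.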

[0017] The secondary shaper scheduler 20 is preferably implemented as a virtual clock shaper similar to that
described in Stiliadios, D. and Varma, A., "A General Methodology for Designing Efficient Traffic Scheduling and
Shaping Algorithms", Proceedings of I.E.E.E. INFOCOM, Japan, 1997 (hereinafter "Stiliadios"), which is incorporated herein
by reference. Each queue is provisioned or associated with a minimum guaranteed service rate, and the secondary
shaper scheduler 20 provides a constant flow of data packets from each queue at its guaranteed service rate. Thus, if
each queue 14 corresponds to a virtual connection, then each connection is provided with a constant average bit rate
stream. The secondary shaper scheduler 20 is non-work conserving, and thus does not necessarily select a queue to
be served every time slot.

[0018] The secondary idle bandwidth scheduler 25 is preferably implemented as a WFQ scheduler wherein each
queue 14 is preferably provisioned or associated with a fixed WFQ weight corresponding to a predetermined allocation
of the instantaneous idle bandwidth of the communications link 16. Thus, for instance, each queue could be assigned
a WFQ weight of 1/N, where N is the total number of queues at any time. The secondary idle bandwidth scheduler
25 is work conserving, such that it submits an eligible queue to the primary scheduler 30 each and every time slot,
provided that at least one queue has a packet stored therein.

[0019] It will be recalled that both secondary schedulers 20 and 25 simultaneously serve the queues 14. Since the
shaper scheduler 20 has a higher priority than the work conserving scheduler 25, the queues are guaranteed their
allocated service rates during their busy periods. However, the shaper scheduler 20 does not always submit queues to the
exhaustive scheduler 30 because 1) the shaper scheduler 20 is non-work conserving, such that no queue may require a
packet to be submitted during a particular time period in order to maintain its minimum guaranteed service or shaper
rate, or 2) all of the queues having non-zero shaper rates are empty during a particular time period (although queues
having zero shaper rates may be busy). These time periods constitute the idle bandwidth of the communications link 16.
During this "idle" time, the lower priority work conserving idle bandwidth scheduler 25, which always selects an eligible
queue every time slot (provided not all of the queues are empty), is able to feed the primary scheduler 30, and hence
distribute the idle bandwidth amongst the queues in accordance with the particular scheme provided by scheduler 25. In
this way, the amount of instantaneous idle bandwidth a queue receives is decoupled from or independent of the
guaranteed service rate assigned to it.

[0020] Other types of work conserving sub-schedulers may alternatively be used to allocate the idle bandwidth of a
resource, such as communications link 16. However, generally speaking, since each queue contends for service from
a shaper scheduler and from a work-conserving scheduler, wherein the former is granted exhaustive priority over the
latter, then the work-conserving scheduler allocates a portion, or proportion, of the instantaneous idle bandwidth of the
resource (hereinafter "idle bandwidth proportion") to the queue.

[0021] A preferred configuration for scheduler 10 for handling various ATM traffic service classes is shown in Table 1
below. In the preferred configuration, the idle bandwidth is "hard-partitioned" between the various ATM service classes,
and each queue is associated with one virtual connection, which in turn is associated with one of the ATM traffic
classes. The idle bandwidth proportions that the rtVBR.2/3, nrtVBR, ABR and UBR service classes receive are $\rho_1$,
$\rho_2$, $\rho_3$ and $\rho_4$, respectively, where $\rho_1 + \rho_2 + \rho_3 + \rho_4 = 1$. The idle bandwidth
allocated to each service class is divided equally amongst connections belonging to the same class. Alternatively, the
idle bandwidth allocated to each service class could be divided amongst connections of the same class in proportion to
their guaranteed minimum service rates. Other schemes are also possible.



EP 0 981 228 A2 



Table 1

ATM Service Class   Guaranteed Minimum Service Rate   Idle Bandwidth Proportion
                    per queue (shaper rate)           per queue (WFQ weight)

CBR                 PCR                               $\infty$
rtVBR.1             VBW                               $\infty$
rtVBR.2/3           VBW                               $\rho_1 / N_{rtVBR.2/3}$
nrtVBR              VBW                               $\rho_2 / N_{nrtVBR}$
ABR                 MCR                               $\rho_3 / N_{ABR}$
UBR                 MCR                               $\rho_4 / N_{UBR}$
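
The per-queue WFQ weights of Table 1 follow from dividing each class proportion equally among the connections of that class. The Python sketch below illustrates this; the function name `idle_weights`, the queue names and the numeric class proportions are hypothetical values chosen for the example, not values from the specification.

```python
import math


def idle_weights(class_of_queue, class_proportion):
    """Per-queue WFQ weight: the class proportion split equally among its queues."""
    counts = {}
    for cls in class_of_queue.values():
        counts[cls] = counts.get(cls, 0) + 1
    return {
        queue: class_proportion[cls] / counts[cls]
        for queue, cls in class_of_queue.items()
    }


# Hypothetical class proportions; the four finite values sum to 1.
class_proportion = {
    "CBR": math.inf,
    "rtVBR.1": math.inf,
    "rtVBR.2/3": 0.1,
    "nrtVBR": 0.2,
    "ABR": 0.3,
    "UBR": 0.4,
}

# Hypothetical queues, one virtual connection each.
class_of_queue = {"q_cbr": "CBR", "q_abr1": "ABR", "q_abr2": "ABR", "q_ubr": "UBR"}

weights = idle_weights(class_of_queue, class_proportion)
# The two ABR queues each get half of the ABR proportion; the CBR weight stays infinite.
```

Because the weight depends only on the class proportion and the number of connections in the class, it is independent of the shaper rate of the queue.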





[0022] The CBR and rtVBR.1 queues are assigned a weight of $\infty$ to ensure that CBR and rtVBR.1 traffic receive
priority service over non-real-time traffic and rtVBR.2/3 traffic. In practice, such sources are typically subject to usage
parameter control (UPC), i.e., a type of policing function to ensure that a connection is abiding by its traffic contract,
and thus tend not to be bursty (i.e., they conform to their CDVTs). Alternatively, other means may be provided in a
communications system for shaping such sources to ensure conformance to the traffic contract. See for example
co-pending Canadian patent application no. 2,229,577, filed February 12, 1998, and assigned to the instant assignee. As such,
CBR and rtVBR.1 connections will generally rarely utilize the instantaneous idle bandwidth managed by scheduler 25.

[0023] The rtVBR.2/3 service class is treated as a non-real time class with respect to the distribution of idle bandwidth
because of the nature of its traffic contract which only requires an ATM communications device, such as a network node
having connection admission control (CAC), to guarantee service to CLP 0 cells but not low priority CLP 1 cells. See,
for instance, ATM FORUM doc. no. af-tm-0056.000, "Traffic Management Specification, Version 4.0", April 1996, all of
which is incorporated herein by reference. CLP 0 cells are subject to two bucket UPC (i.e., PCR conformance and SCR
conformance) and thus typically tend to be well-behaved. However, CLP 1 cells are only subject to one bucket UPC (i.e.,
only PCR conformance) such that an aggregate CLP0+1 stream is only constrained by a PCR bucket. Accordingly, it is
possible for CLP 1 cells from numerous rtVBR.2/3 connections to arrive at the node in extended bursts. If the rtVBR.2/3
traffic class were granted an idle bandwidth proportion of $\infty$, as are the other real time traffic classes, then the
CLP 1 cells could cause service starvation of the non-real time traffic such as UBR, thereby flooding the node. However,
CLP 1 cells may in fact be discarded or alternatively delayed to the same (or worse) extent as the non-real time traffic.
Thus, by assigning the rtVBR.2/3 traffic a finite idle bandwidth proportion similar to the non-real time traffic, this
problem may be avoided.

[0024] Figure 3 shows a hierarchical scheduler 50 according to a second preferred embodiment. In this embodiment
the secondary shaper scheduler 20 is composed of two shaper sub-schedulers 20A and 20B, and the secondary idle
bandwidth scheduler 25 is composed of two work conserving sub-schedulers 25A and 25B. Queues 14A, which are
associated with the CBR and rtVBR.1 ATM traffic classes, are concurrently serviced by shaper sub-scheduler 20A and
work conserving sub-scheduler 25A. Queues 14B, which are associated with the rtVBR.2/3 ATM traffic classes, are
concurrently serviced by shaper sub-scheduler 20A and work conserving sub-scheduler 25B. Queues 14C, which are
associated with the non-real time ATM traffic classes, are concurrently serviced by shaper sub-scheduler 20B and work
conserving sub-scheduler 25B. The order of priority associated with the secondary sub-schedulers is 20A, 20B, 25A
and 25B, from highest to lowest priority.
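
The queue-to-sub-scheduler assignment and priority order of paragraph [0024] can be written down as a small configuration. The Python sketch below is illustrative only: the names `PRIORITY`, `SERVED_BY` and `winning_subscheduler` are assumptions, while the reference numerals are those of Figure 3.

```python
# Priority order of the secondary sub-schedulers, highest first.
PRIORITY = ["20A", "20B", "25A", "25B"]

# (shaper sub-scheduler, idle bandwidth sub-scheduler) serving each queue group.
SERVED_BY = {
    "14A": ("20A", "25A"),  # CBR and rtVBR.1
    "14B": ("20A", "25B"),  # rtVBR.2/3
    "14C": ("20B", "25B"),  # non-real time classes
}


def winning_subscheduler(busy):
    """Return the highest-priority sub-scheduler among those offering a queue."""
    for name in PRIORITY:
        if name in busy:
            return name
    return None


# For every queue group, its shaper outranks its idle bandwidth sub-scheduler.
shaper_outranks_idle = all(
    PRIORITY.index(shaper) < PRIORITY.index(idle)
    for shaper, idle in SERVED_BY.values()
)
```

Placing both shapers above both idle bandwidth sub-schedulers means every guaranteed rate is honoured before any idle bandwidth is handed out, whichever group a queue belongs to.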

[0025] A preferred configuration for scheduler 50 for handling the various ATM traffic service classes is shown in Table
2 below. As before, each queue is associated with a single virtual connection and each queue/virtual connection is
ensured its guaranteed minimum service rate, yet due to the priority of shaper sub-scheduler 20A over 20B, real time
traffic is scheduled ahead of non-real time traffic when shaping. As before, the critical real time traffic (CBR and
rtVBR.1) classes also receive idle bandwidth in preference to the other traffic classes. However, the second
embodiment provides a mechanism for better managing the distribution of idle bandwidth amongst the critical real-time
traffic. This is because the idle bandwidth that the CBR and rtVBR.1 service classes receive relative to one another is
$\alpha_1 : \alpha_2$, where $\alpha_1 + \alpha_2 = 1$, and the allocated idle bandwidth per service class is divided
equally amongst connections belonging to the same class. In this manner, the idle bandwidth proportion allocated to a
CBR or rtVBR.1 queue can be well controlled, unlike the first embodiment, while still giving these service classes
preferential treatment in receiving idle bandwidth over the other service classes.

Table 2

ATM Service Class   Guaranteed Minimum Service Rate   Idle Bandwidth Proportion
                    per queue (shaper rate)           per queue (WFQ weight)

CBR                 PCR                               $\alpha_1 / N_{CBR}$
rtVBR.1             VBW                               $\alpha_2 / N_{rtVBR.1}$
rtVBR.2/3           VBW                               $\rho_1 / N_{rtVBR.2/3}$
nrtVBR              VBW                               $\rho_2 / N_{nrtVBR}$
ABR                 MCR                               $\rho_3 / N_{ABR}$
UBR                 MCR                               $\rho_4 / N_{UBR}$



[0026] It will also be noted from the second embodiment that the rtVBR.2/3 queues are treated as a separate group,
serviced by the high priority shaper sub-scheduler 20A along with the other real time classes, but serviced by the lowest
priority idle bandwidth sub-scheduler 25B, similar to the non-real time traffic classes. This is done to prevent CLP 1
flooding for the reasons stated above.

[0027] Figure 4 is a flowchart illustrating, at a high level, a method according to the first and second preferred
embodiments for implementing hierarchical schedulers 10 and 50. The preferred method uses a form of time stamping, as
described in greater detail below, for keeping track of when queues should be serviced. Thus, in the event 60 of the
arrival of a data packet 12, an input processing stage 62 issues the time stamp, as required. In parallel, an output
processing stage 64 dequeues data packets 12 from queues 14 for transmission to the communications link 16, as well
as issues time stamps under certain circumstances.

[0028] In the preferred method, each queue 14 is associated with two time stamps, called theoretical emission times
(TETs). One of these time stamps (hereinafter "shaper TET") is used by the secondary shaper scheduler 20, and the
other time stamp (hereinafter "WFQ TET") is used by the preferred secondary WFQ idle bandwidth scheduler 25. The
preferred method differs from the virtual clock shaping technique described in Stiliadios, supra, in that in the prior art
reference, each packet has a time stamp associated with it whereas in the preferred embodiments time stamping is
performed at the level of each queue. The inventors have found that time stamping per queue is likely more economical to
implement in practice because of lower memory storage requirements.

[0029] Both time stamps of a queue change whenever a new packet reaches the head-of-line (HOL) position in the
queue. This happens either when (a) a data packet arrives at an empty queue, or (b) a data packet has just been served
and its queue has a following data packet waiting to be serviced which progresses to the HOL position.

[0030] If a packet has just been served and its queue has no more packets waiting, its time stamps are no longer valid;
i.e., the queue is no longer considered eligible for scheduling by the secondary shaper scheduler 20 and the preferred
WFQ secondary idle bandwidth scheduler 25 (including any sub-schedulers thereof).

[0031] According to the preferred method, when a packet arrives at an empty queue /, thereby placing the packet at 
the HOL position thereof, the shaper TET (i.e., for the purposes of the secondary shaper sub-schedulers) of a given 
queue / at a current time RTP (real time pointer) is: 

45 _ i 

7HT / = max{7ET /f /?7Py+£-, (1) 



where R/ is the shaper rate of queue /. 
[0032] In the event a packet is dequeued from queue i such that another packet waiting in the queue reaches the HOL
position, the shaper TET of queue i is:

$$TET_i = RTP + \frac{1}{R_i}. \qquad (2)$$
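
The two shaper time stamp rules of equations (1) and (2) can be illustrated with a minimal sketch. The function names and the use of exact fractions are assumptions made for this illustration only; the specification does not prescribe any particular implementation.

```python
from fractions import Fraction

def shaper_tet_on_arrival(tet, rtp, rate):
    """Equation (1): a packet arrives at an empty queue and becomes HOL."""
    return max(tet, rtp) + Fraction(1) / rate

def shaper_tet_on_dequeue(rtp, rate):
    """Equation (2): a packet is dequeued and the next one becomes HOL."""
    return rtp + Fraction(1) / rate

# Hypothetical queue shaped to one packet every 4 time slots (R_i = 1/4).
rate = Fraction(1, 4)
tet_a = shaper_tet_on_arrival(tet=Fraction(0), rtp=Fraction(10), rate=rate)
tet_b = shaper_tet_on_dequeue(rtp=Fraction(14), rate=rate)
```

With these illustrative values, a stale time stamp of 0 is replaced by RTP + 1/R_i on arrival, so the spacing between emissions is governed by the shaper rate.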



[0033] The shaper time stamp stays with the queue until it is served. At each time slot, the secondary shaper
scheduler 20 (or shaper sub-schedulers 20A and 20B in the second embodiment) serves the queue with the smallest shaper
TET out of all eligible queues. A queue i is eligible if

$$TET_i \le RTP. \qquad (3)$$



[0034] Mathematically, the index, j, of the chosen queue is expressed as:

$$j = \arg\min_i \{\, TET_i \mid TET_i \le RTP \,\}. \qquad (4)$$
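
The eligibility test of equation (3) and the selection rule of equation (4) might be sketched as follows. The dictionary representation and the tie-breaking behaviour (first queue in iteration order) are assumptions of this sketch, not features stated in the specification.

```python
def select_shaper_queue(tets, rtp):
    """Return the eligible queue with the smallest shaper TET, or None.

    tets maps a queue identifier to its shaper TET and is assumed to
    contain only non-empty queues (empty queues have no valid time stamp).
    """
    eligible = {q: t for q, t in tets.items() if t <= rtp}   # equation (3)
    if not eligible:
        return None            # non-work conserving: nothing is served
    return min(eligible, key=eligible.get)                   # equation (4)

tets = {"q0": 12, "q1": 9, "q2": 15}
chosen = select_shaper_queue(tets, rtp=13)   # q0 and q1 are eligible
nobody = select_shaper_queue(tets, rtp=5)    # no queue is eligible yet
```

Returning no queue when none is eligible is what makes the shaper non-work conserving; the slot is then available to the idle bandwidth scheduler.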



[0035] In the preferred embodiments, the WFQ sub-scheduler which functions as the secondary work conserving
scheduler 25 (or sub-schedulers 25A and 25B thereof in the second embodiment) also time stamps each queue
serviced. This can be accomplished with a self-clocked fair queuing scheme (SCFQ), described in greater detail, for
instance, in Goyal et al., "Start-Time Fair Queuing: A Scheduling Algorithm for Integrated Services Packet Switching
Networks", IEEE/ACM Trans. Networking, Vol. 5, No. 5, October 1997, and incorporated herein by reference
(hereinafter "Goyal"). With SCFQ, when a new packet arrives at the HOL position of a queue i, the WFQ TET is:

$$TET_i = \max\{TET_i,\ VTP\} + \frac{1}{\phi_i}, \qquad (5)$$

where VTP (virtual time pointer) is the WFQ TET value of the queue that was last served by the WFQ scheduler, and
φ_i is the WFQ weight of queue i. (Note that each WFQ scheduler has a VTP variable associated with it. In contrast, all
shapers employ a common RTP variable since it is a measure of real time.)

[0036] Alternatively, the time stamping for the preferred WFQ sub-scheduler may be carried out with a start-time fair
queuing scheme (SFQ) described in greater detail in Goyal, supra. With SFQ, when a new packet arrives at the HOL
position of queue i, the WFQ TET is:

$$TET_i = \max\left\{TET_i + \frac{1}{\phi_i},\ VTP\right\}, \qquad (6)$$

where VTP, again, is the WFQ TET value of the queue that was last served, and φ_i is the WFQ weight of queue i.
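
The difference between the SCFQ rule of equation (5) and the SFQ rule of equation (6) lies only in where the increment 1/φ_i is applied. A minimal sketch, with hypothetical function names and illustrative values, is:

```python
def wfq_tet_scfq(tet, vtp, weight):
    """Equation (5): take the later of TET and VTP, then add 1/weight."""
    return max(tet, vtp) + 1.0 / weight

def wfq_tet_sfq(tet, vtp, weight):
    """Equation (6): add 1/weight to TET first, then take the later value."""
    return max(tet + 1.0 / weight, vtp)

# Illustrative values: a queue whose old stamp lags the virtual time pointer.
scfq = wfq_tet_scfq(tet=2.0, vtp=5.0, weight=0.5)
sfq = wfq_tet_sfq(tet=2.0, vtp=5.0, weight=0.5)
```

In this example the SCFQ stamp lands 1/φ_i after VTP, whereas the SFQ stamp is simply pulled up to VTP, so a queue that has been idle is served sooner under the SFQ rule.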
[0037] Figure 5 is a flowchart illustrating the input processing stage 62 (of Fig. 4) in greater detail. An initial step 70
enqueues the arriving packet in the appropriate queue 14. The incoming packets may be queued using a variety of
queuing schemes, for instance, per priority queuing according to QoS or ATM traffic classes (such as shown in Fig. 3),
per VC queuing, per port queuing, or combinations of the foregoing. Step 72 checks whether the packet enqueued in
step 70 was placed at the head of the corresponding queue. If not, the process terminates. Otherwise, step 74
calculates the time stamp for the appropriate queue for subsequent use by the shaper scheduler 20 in accordance with
equation (1). Similarly, step 76 calculates the time stamp for the appropriate queue for subsequent use by the preferred WFQ
scheduler in accordance with one of equations (5) and (6), as desired.
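
The input processing steps 70 to 76 can be sketched as below. The `Queue` class, its field names, and the choice of the SCFQ rule of equation (5) at step 76 are assumptions made purely for illustration.

```python
from collections import deque

class Queue:
    """Hypothetical per-queue state: one shaper TET and one WFQ TET."""
    def __init__(self, rate, weight):
        self.packets = deque()
        self.rate = rate          # shaper rate R_i
        self.weight = weight      # WFQ weight phi_i
        self.shaper_tet = 0.0
        self.wfq_tet = 0.0

def on_arrival(queue, packet, rtp, vtp):
    """Return True if the packet became HOL and the queue was re-stamped."""
    queue.packets.append(packet)                     # step 70: enqueue
    if len(queue.packets) > 1:                       # step 72: not at HOL
        return False
    # step 74: shaper time stamp, equation (1)
    queue.shaper_tet = max(queue.shaper_tet, rtp) + 1.0 / queue.rate
    # step 76: WFQ time stamp, here using equation (5)
    queue.wfq_tet = max(queue.wfq_tet, vtp) + 1.0 / queue.weight
    return True

q = Queue(rate=0.25, weight=0.5)
first = on_arrival(q, "p1", rtp=8.0, vtp=3.0)    # queue was empty
second = on_arrival(q, "p2", rtp=9.0, vtp=3.0)   # queue already had an HOL packet
```

Only the first arrival touches the time stamps; the second packet waits behind the HOL packet and is stamped later, when it reaches the HOL position.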

[0038] Figure 6 is a flowchart illustrating the output processing stage 64 (of Fig. 4) in greater detail for one time slot.
In an initial step 80, the secondary shaper and idle bandwidth schedulers 20 and 25 (which are preferably implemented
as separate independent computer processes or threads of execution) concurrently and independently select a queue,
if any, for servicing in the current time slot. In the case of the shaper scheduler 20, the queue selection is made in
accordance with equation (4). The idle bandwidth scheduler 25 selects the queue having the lowest valued WFQ TET.

[0039] At step 82, each of the secondary shaper and idle bandwidth schedulers 20 and 25 concludes whether or not
a packet should be dequeued. If so, then at step 84, the secondary schedulers 20 and 25 submit the identities, Qj and
Qk, of the respectively selected queues to the exhaustive scheduler 30.

[0040] At step 86, the primary exhaustive scheduler 30 checks whether or not the secondary shaper scheduler 20
has submitted a queue for servicing. If so, then at step 90 the primary exhaustive scheduler 30 dequeues the HOL
packet from queue Qj and forwards the packet to the communications link 16. At step 96, queue Qj (the queue just
served) is examined to determine whether it is empty or not. If it is not empty, then a following packet within queue Qj
becomes an HOL packet and thus at step 100 the shaper TET for queue Qj is updated in accordance with equation (2).
However, if queue Qj is empty, then control passes to step 104. In this event an adjustment is made to the WFQ TET
for queue Qj as follows:

$$TET_{Q_j} = TET_{Q_j} - \frac{1}{\phi_{Q_j}}.$$

This adjustment is made because the time stamping system operates on the principle of accumulated "credits", as will be
understood by those skilled in the art, and thus, since the secondary idle bandwidth scheduler 25 did not in fact service
queue Qj, its credit in respect of the secondary scheduler 25 is preferably corrected before the next packet which will
be an HOL packet arrives in queue Qj.

[0041] If at step 86 the secondary shaper scheduler 20 has not submitted a queue for servicing, then at step 88 the
primary exhaustive scheduler 30 checks whether or not the secondary idle bandwidth scheduler 25 has submitted a
queue for servicing. If so, then at step 92 the primary exhaustive scheduler 30 dequeues the HOL packet from queue
Qk and forwards the packet to communications link 16. At step 98, queue Qk (the queue just served) is examined to
determine whether it is empty or not. If it is not empty, then a following packet within queue Qk becomes an HOL packet
and thus at step 102 the WFQ TET time stamp for queue Qk is updated in accordance with one of equations (5) and
(6), as desired. However, if queue Qk is empty, then control passes to step 106. In this event a credit adjustment, for the
reasons previously described, is now made to the shaper TET as follows:

$$TET_{Q_k} = TET_{Q_k} - \frac{1}{R_{Q_k}}.$$
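
One time slot of the output processing of Figure 6 could be sketched as follows. The `Queue` dataclass, the function name `serve_slot`, the use of the SCFQ rule at step 102, and the point at which VTP is updated are all assumptions of this sketch; the step numbers in the comments refer to the flowchart as described in the text.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    rate: float        # shaper rate R
    weight: float      # WFQ weight phi
    shaper_tet: float
    wfq_tet: float
    packets: deque = field(default_factory=deque)

def serve_slot(queues, rtp, vtp):
    """Serve at most one packet; return (queue name, scheduler used, VTP)."""
    busy = [q for q in queues if q.packets]
    # Step 80: both secondary schedulers select independently.
    eligible = [q for q in busy if q.shaper_tet <= rtp]
    qj = min(eligible, key=lambda q: q.shaper_tet) if eligible else None
    qk = min(busy, key=lambda q: q.wfq_tet) if busy else None
    if qj is not None:                       # steps 86 and 90: shaper wins
        qj.packets.popleft()
        if qj.packets:                       # steps 96 and 100: equation (2)
            qj.shaper_tet = rtp + 1.0 / qj.rate
        else:                                # step 104: WFQ credit adjustment
            qj.wfq_tet -= 1.0 / qj.weight
        return qj.name, "shaper", vtp
    if qk is not None:                       # steps 88 and 92: idle bandwidth
        qk.packets.popleft()
        vtp = qk.wfq_tet                     # VTP follows the last WFQ service
        if qk.packets:                       # steps 98 and 102: equation (5)
            qk.wfq_tet = max(qk.wfq_tet, vtp) + 1.0 / qk.weight
        else:                                # step 106: shaper credit adjustment
            qk.shaper_tet -= 1.0 / qk.rate
        return qk.name, "idle", vtp
    return None, None, vtp

a = Queue("a", 0.5, 0.5, shaper_tet=4.0, wfq_tet=6.0, packets=deque(["a1", "a2"]))
b = Queue("b", 0.25, 0.25, shaper_tet=10.0, wfq_tet=3.0, packets=deque(["b1"]))
r1 = serve_slot([a, b], rtp=5.0, vtp=0.0)   # queue a is shaper-eligible
r2 = serve_slot([a, b], rtp=6.0, vtp=0.0)   # no queue eligible: idle bandwidth
```

In the second slot neither queue satisfies equation (3), so the slot is given to the queue with the lowest WFQ TET, which illustrates how idle bandwidth is distributed independently of the shaper rates.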



[0042] Figure 7 shows the processing carried out by the primary exhaustive scheduler 30 specifically for the case of
the second embodiment (Fig. 4). It will be noted that a query is made of each of the secondary sub-schedulers 20A,
20B, 25A, and 25B, in that order, to determine whether or not they have submitted queues, and the submitted queues
are serviced accordingly in a manner similar to that described above.
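
The exhaustive (strict priority) polling order described for Figure 7 might be sketched as follows. The dictionary of submissions and the function name are hypothetical; only the polling order 20A, 20B, 25A, 25B is taken from the text.

```python
def exhaustive_select(submissions, priority_order=("20A", "20B", "25A", "25B")):
    """Return (sub-scheduler, queue id) for the first sub-scheduler, in
    priority order, that has submitted a queue; None if none has."""
    for name in priority_order:
        queue_id = submissions.get(name)
        if queue_id is not None:
            return name, queue_id
    return None

pick = exhaustive_select({"20A": None, "20B": "Q7", "25A": "Q2", "25B": None})
idle = exhaustive_select({"25B": "Q9"})
empty = exhaustive_select({})
```

A lower-priority submission is served only when every higher-priority sub-scheduler has nothing to offer in the current time slot.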

[0043] Figure 8 shows a tertiary level hierarchical scheduler 110 according to a third preferred embodiment. In this
embodiment the scheduler 110 services sixty-four inputs (numbered 1 through 64), each of which carries multiple
connections from any of the various ATM traffic classes. The cells from each input are stored in a set of queues. Each
set comprises at least one queue from each traffic class. Thus, for example, there are sixty-four CBR queues in total,
one for each input, and sixty-four nrtVBR queues in total, one for each input. UBR traffic is divided into two sections:
"UBR m" refers to multicast UBR traffic; "UBR s" refers to single-cast UBR traffic.
[0044] In this embodiment the shaper scheduler 20 is composed of two shaper sub-schedulers 20A and 20B which
feed an exhaustive sub-scheduler 112 (which in turn feeds exhaustive scheduler 30). Shaper sub-scheduler 20A
services the real time traffic classes, and shaper sub-scheduler 20B services the non-real time traffic classes (and
rtVBR.2/3). Shaper sub-scheduler 20A has a higher priority with respect to exhaustive sub-scheduler 112 than shaper
sub-scheduler 20B, in a manner similar to the second preferred embodiment shown in Figure 3.
[0045] The idle bandwidth scheduler 25 is composed of five tertiary WFQ sub-schedulers 114A-E which feed a
secondary WFQ sub-scheduler 116. In this hierarchy, the weights of the secondary WFQ sub-scheduler 116 correspond to
a portion of idle bandwidth allocated to an ATM traffic group as a whole, and the weights of a given tertiary WFQ
sub-scheduler 114 divide up the portion of idle bandwidth allocated to the corresponding ATM traffic group amongst the
queues of that group.
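
When every queue is busy and all weights are finite, the two levels of weights combine multiplicatively. The following sketch illustrates that arithmetic only; the group names, queue names and weight values are hypothetical, and the normalisation of weights to a sum of one is an assumption of the sketch.

```python
from fractions import Fraction

def effective_share(secondary_weights, tertiary_weights, group, queue):
    """Fraction of idle bandwidth reaching one queue, all queues busy."""
    group_share = Fraction(secondary_weights[group]) / sum(secondary_weights.values())
    in_group = tertiary_weights[group]
    queue_share = Fraction(in_group[queue]) / sum(in_group.values())
    return group_share * queue_share

# Hypothetical secondary (per group) and tertiary (per queue) weights.
secondary = {"nrtVBR": Fraction(1, 5), "ABR": Fraction(3, 10), "UBR": Fraction(1, 2)}
tertiary = {"nrtVBR": {"q1": 2, "q2": 1, "q3": 1}}
share_q1 = effective_share(secondary, tertiary, "nrtVBR", "q1")
```

Here the nrtVBR group is given one fifth of the idle bandwidth and queue q1 takes half of that, i.e. one tenth overall.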

[0046] A preferred configuration for scheduler 110 is shown in Table 3, below. In the table, the notation
"Σ_{rtVBR.2/3,queue} VBW" means, for example, the sum of the VBWs of all connections sharing a particular rtVBR.2/3
queue, and "Σ_{rtVBR.2/3} VBW" means the sum of the VBWs of all rtVBR.2/3 connections on all inputs, i.e., in the entire
system. In addition, the notations "N_{UBR,queue}" and "N_{UBR}" respectively denote, for example, the number of UBR
connections in a particular queue, and the total number of UBR connections on all inputs.

Table 3

ATM Service   Guaranteed Minimum Service     Tertiary WFQ weight                                    Secondary WFQ
Class         Rate per queue (shaper rate)   (per queue)                                            weight (per class)
-----------   ----------------------------   ----------------------------------------------------   ------------------
CBR           Σ_{CBR,queue} PCR              Σ_{CBR,queue} PCR / Σ_{CBR} PCR                        ∞
rtVBR.1       Σ_{rtVBR.1,queue} VBW          Σ_{rtVBR.1,queue} VBW / Σ_{rtVBR.1} VBW                ∞
rtVBR.2/3     Σ_{rtVBR.2/3,queue} VBW        Σ_{rtVBR.2/3,queue} VBW / Σ_{rtVBR.2/3} VBW            ρ_1
nrtVBR        Σ_{nrtVBR,queue} VBW           Σ_{nrtVBR,queue} VBW / Σ_{nrtVBR} VBW                  ρ_2
ABR           Σ_{ABR,queue} MCR              N_{ABR,queue} / N_{ABR}                                ρ_3
UBR           Σ_{UBR,queue} MCR              N_{UBR,queue} / N_{UBR}                                ρ_4
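
The per-queue entries of Table 3 follow two patterns: a rate-proportional weight (as for the VBR classes) and a connection-count weight (as for ABR and UBR). A minimal sketch of both, using hypothetical helper names and made-up connection values, is:

```python
from fractions import Fraction

def rate_proportional(queue_rates, class_rates):
    """Shaper rate and tertiary weight for a rate-weighted class.

    queue_rates: rates (e.g. VBW) of the connections sharing one queue.
    class_rates: rates of all connections of the class on all inputs.
    """
    shaper_rate = sum(queue_rates)
    return shaper_rate, Fraction(shaper_rate, sum(class_rates))

def count_proportional(queue_mcrs, total_connections):
    """Shaper rate and tertiary weight for a count-weighted class."""
    return sum(queue_mcrs), Fraction(len(queue_mcrs), total_connections)

# Illustrative numbers only (integer rate units).
nrt_rate, nrt_weight = rate_proportional([10, 30], [10, 30, 20, 40])
ubr_rate, ubr_weight = count_proportional([0, 0, 2], total_connections=12)
```

The first helper gives a queue carrying 40 of the class's 100 rate units a weight of 2/5; the second gives a queue carrying 3 of 12 connections a weight of 1/4 regardless of their rates.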



[0047] As before, the ATM service classes are divided into three distinct groups as follows: (1) critical real time traffic
(CBR and rtVBR.1 queues); (2) real-time traffic with CLP1 cells that can be dropped (rtVBR.2 and rtVBR.3); and (3)
non-real time traffic (nrtVBR, UBR and ABR). The first and second groups (queues associated with real time traffic)
have exhaustive priority over the third group (queues associated with non-real time traffic) with respect to shaper
sub-schedulers 20A and 20B. This ensures that the real-time queues get the bandwidth allocated to them by the CAC with
minimal delay.

[0048] With respect to the allocation of the idle bandwidth, the first group of critical real time traffic queues (whose
connections have QoS guarantees for both CLP 0 and CLP 1 cells) receive exhaustive priority over the non-real time
traffic since they are assigned a weight of ∞. The rtVBR.2/3 traffic class is not included in this category as rtVBR.2/3
traffic has CLP 1 cells which have no QoS guarantees. Instead, the idle bandwidth is distributed in a finite manner by
the secondary WFQ sub-scheduler 116 by giving each of the rtVBR.2/3, nrtVBR, UBR and ABR traffic classes some
predetermined ratio, ρ_i, of the total available idle bandwidth. Within the rtVBR.2/3 and nrtVBR traffic classes, the idle
bandwidth of the service class is divided amongst the queues in proportion to the guaranteed service rates of the
connections carried by the queues. Within the ABR and UBR classes, the idle bandwidth allocated to each class is
distributed approximately equally amongst all connections in the class. Of course, it is also possible to achieve proportional
distribution of the idle bandwidth by setting the weights of the tertiary sub-schedulers 114D and 114E for each queue
to Σ_{queue} MCR / Σ_{class} MCR, i.e., the sum of the MCRs of the connections in the queue relative to the sum of the
MCRs of all connections in its class.

[0049] The two-level hierarchy of scheduler 25 also provides a more flexible arrangement for distributing the idle
bandwidth allotment within a given service class. For example, consider a scenario where there are five queues in total in
the nrtVBR class, and the class is allocated 20% of the idle bandwidth. If all five queues are active, then each queue
receives 1/5th of the 20% of idle bandwidth allocated to the class. However, if only one of the five queues is active, then
due to the work conserving nature of WFQ, that queue would always be served by the appropriate tertiary WFQ sub-scheduler
114 which, in turn, receives 20% of the idle bandwidth from the secondary WFQ sub-scheduler 116. Thus, the only busy
queue in the class receives 20% of the idle bandwidth. In contrast, in the second embodiment shown in Figure 3, each
queue has an idle bandwidth proportion of ρ/N, and thus under the presented scenario the busy queue would only
receive approximately 1/5th of 20% of the idle bandwidth (with the other 16% divided up amongst all busy schedulers).
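
The five-queue scenario above can be checked with a short calculation. The sketch assumes, purely for illustration, that the remaining 80% of the idle bandwidth weight belongs to other classes whose queues are all busy, and that a work conserving WFQ scheduler shares bandwidth among busy queues in proportion to their weights.

```python
from fractions import Fraction

rho = Fraction(20, 100)           # idle bandwidth weight of the nrtVBR class
n_queues = 5                      # queues in the class, only one of them busy
other_busy = Fraction(80, 100)    # assumed total weight of other busy traffic

# Flat arrangement (second embodiment): the busy queue competes with its own
# weight rho/N against all other busy weights.
flat_weight = rho / n_queues
flat_share = flat_weight / (flat_weight + other_busy)

# Two-level arrangement (third embodiment): the class as a whole competes with
# weight rho, and the only busy queue takes the entire class share.
hier_share = rho / (rho + other_busy)
```

Under these assumptions the flat arrangement gives the busy queue 1/21 of the idle bandwidth (slightly more than 4%, consistent with "approximately 1/5th of 20%"), whereas the two-level arrangement gives it the full 20%.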
[0050] The preferred hierarchical schedulers described above have been configured relative to an ATM applications
environment. It will be understood that the hierarchical schedulers can be configured for other types of applications. In the
general case, the hierarchical schedulers may comprise M non-work conserving shaper sub-schedulers feeding an
exhaustive sub-scheduler, and N work conserving idle bandwidth sub-schedulers feeding the exhaustive sub-scheduler,
provided that a queue is concurrently contending for service from one of the shaper sub-schedulers and from one
of the idle bandwidth sub-schedulers, and the shaper sub-scheduler serving the queue has a higher exhaustive priority
level than the idle bandwidth sub-scheduler serving the same queue. Figure 9 shows an example of such a hierarchical
scheduler where M = 2 and N = 3.
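
The condition stated for the general case can be expressed as a simple configuration check. The scheduler names, priority numbering (lower number served first) and queue assignments below are hypothetical and are not taken from Figure 9.

```python
def valid_assignment(priority, assignment):
    """Check that, for every queue, its shaper sub-scheduler has a higher
    exhaustive priority (lower number) than its idle bandwidth sub-scheduler.

    priority:   sub-scheduler name -> exhaustive priority level
    assignment: queue name -> (shaper sub-scheduler, idle sub-scheduler)
    """
    return all(priority[s] < priority[w] for s, w in assignment.values())

# M = 2 shapers (S1, S2) and N = 3 idle bandwidth sub-schedulers (W1..W3).
priority = {"S1": 0, "W1": 1, "S2": 2, "W2": 3, "W3": 4}
good = valid_assignment(priority, {"q1": ("S1", "W1"), "q2": ("S2", "W3")})
bad = valid_assignment(priority, {"q1": ("S1", "W1"), "q2": ("S2", "W1")})
```

The second assignment fails because queue q2 would be offered idle bandwidth by W1 ahead of its own guaranteed service from S2.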

[0051] It should also be understood that the communications link 16 may represent a transmission medium, or
alternatively, an internal bus or pathway in a communications device. Similarly, it should be appreciated that the preferred
embodiments may be applied in other types of applications where a number of processes generate various types of
messages or jobs which are stored in a plurality of queues for subsequent processing by a single resource having a finite
processing capability or bandwidth. Moreover, while the preferred embodiments have made reference to synchronous
sub-schedulers operating on fixed length data packets over equal time periods, in alternative embodiments
of the invention the schedulers may be asynchronous in nature and operate on variable length data packets. Similarly,
those skilled in the art will appreciate that other modifications and variations may be made to the preferred
embodiments disclosed herein without departing from the spirit of the invention.

[0052] Where technical features mentioned in any claim are followed by reference signs, those reference signs have
been included for the sole purpose of increasing the intelligibility of the claims and accordingly, such reference signs do
not have any limiting effect on the scope of each element identified by way of example by such reference signs.

Claims 

1. A method for servicing a plurality of queues holding data packets destined for transmission over a communications
link, said method comprising:

(a) provisioning each queue with a minimum guaranteed service rate;

(b) provisioning each queue with an idle bandwidth proportion;

(c) servicing each queue by forwarding the data packets thereof to the communications link at time intervals
approximately equivalent to the minimum guaranteed service rate of the queue, provided the queue is
non-empty; and

(d) during time intervals when none of the queues have messages being forwarded to the resource in
conformance with step (c), servicing the queues in accordance with the proportion of idle bandwidth allocated to each
queue.

2. A method according to Claim 1, wherein in step 1(d) the queues are serviced using a work conserving, weighted
fair queuing (WFQ) scheme, and the idle bandwidth proportion is a WFQ weight.

3. A method according to Claim 2, wherein the idle bandwidth WFQ weight for each queue is one of:

(a) a predetermined fixed ratio relative to other queues;

(b) proportional to the instantaneous length of the queue;

(c) equal to P/N, where N is the number of queues in a common service class and P is a percentage of idle
bandwidth allocated to that service class; and

(d) proportional to its minimum guaranteed service rate relative to the guaranteed service rates of other
queues within a common service class.

4. A method according to Claim 2 or Claim 3 wherein the data packets are of fixed size and the communications link
has a fixed bandwidth rate.

5. Apparatus for servicing a plurality of queues holding data packets destined for transmission over a communications
link, said apparatus comprising:

means for provisioning each queue with a minimum guaranteed service rate;

means for provisioning each queue with an idle bandwidth proportion;

first means for servicing each queue by forwarding the data packets thereof to the communications link at time
intervals substantially corresponding to the minimum guaranteed service rate of the queue, provided the queue
is non-empty; and

second means for servicing the queues in accordance with the proportion of idle bandwidth allocated to each
queue during time intervals when none of the queues have messages being forwarded to the resource by the
first means for servicing the queues.

6. Apparatus according to Claim 5, wherein the first and second service means include a hierarchical scheduler
comprising:

an exhaustive priority scheduler for servicing a plurality of lower level schedulers in accordance with non-equal
priority levels assigned thereto;

a non-work conserving shaper scheduler feeding the exhaustive scheduler; and

a work conserving idle bandwidth scheduler feeding the exhaustive scheduler,

wherein a given queue is concurrently serviced by the shaper scheduler and the idle bandwidth scheduler, and
wherein the shaper scheduler has a higher priority level with respect to the exhaustive sub-scheduler than the
idle bandwidth scheduler.

7. Apparatus according to Claim 5, wherein the first and second service means include a hierarchical scheduler
comprising:

an exhaustive scheduler servicing a plurality of lower level schedulers in accordance with non-equal priority
levels assigned thereto;

M non-work conserving shaper schedulers feeding the exhaustive scheduler; and

N work conserving idle bandwidth schedulers feeding the exhaustive scheduler;

wherein a given queue is concurrently serviced by one of the shaper schedulers and one of the idle bandwidth
schedulers, and wherein the shaper scheduler servicing the given queue has a higher priority level with respect
to the exhaustive scheduler than the idle bandwidth scheduler servicing the given queue.



8. Apparatus according to Claim 6 or Claim 7, wherein a given shaper scheduler is itself hierarchical, comprising a
multi-tiered arrangement of sub-schedulers, and a given idle bandwidth scheduler is itself hierarchical, comprising
a multi-tiered arrangement of sub-schedulers, and wherein a set of queues are concurrently examined by one of
the shaper sub-schedulers and one of the idle bandwidth sub-schedulers, and wherein the shaper sub-scheduler
examining the set of queues has a higher priority level with respect to the exhaustive sub-scheduler than the idle
bandwidth sub-scheduler examining the same set of queues.

9. Apparatus according to any of Claims 6, 7 and 8, wherein the schedulers service their respective queues by
concluding which queue should be serviced and passing a queue identifier to a higher level scheduler, provided it
exists.

10. Apparatus according to any of Claims 6-9, wherein the idle bandwidth scheduler is a weighted fair queue (WFQ)
scheduler.

11. Apparatus according to Claim 10, wherein the queues are time-stamped and the time stamp, TET, of a given queue
i, in respect of a shaper scheduler is set to

(a)

$$TET_i = \max\{TET_i,\ RTP\} + \frac{1}{R_i},$$

in the event an incoming data packet is placed at a head-of-line position in queue i, or

(b)

$$TET_i = RTP + \frac{1}{R_i},$$

in the event a data packet is dequeued placing another packet at a head-of-line position in queue i, where RTP
corresponds to a current time and R_i is the minimum guaranteed service rate of queue i; and
the shaper scheduler selects a queue, j, for service such that

$$j = \arg\min_i \{\, TET_i \mid TET_i \le RTP \,\}.$$

12. Apparatus according to Claim 10, wherein the queues are time-stamped and the time stamp, TET, of a given queue
i, in respect of a WFQ scheduler is set to one of

(a)

$$TET_i = \max\{TET_i,\ VTP\} + \frac{1}{\phi_i},$$

and

(b)

$$TET_i = \max\left\{TET_i + \frac{1}{\phi_i},\ VTP\right\},$$

in the event an incoming data packet is placed at a head-of-line position in queue i, where VTP is the TET value
of a queue that was last served, and φ_i is a WFQ weight; and wherein the WFQ scheduler selects a queue, j, for
service such that

$$j = \arg\min_i \{ TET_i \}.$$

13. Apparatus according to Claim 11 or Claim 12, wherein: (a) in the event a packet is dequeued from a queue, Qj,
selected by the shaper scheduler, the TET for queue Qj for the corresponding WFQ scheduler is adjusted by

$$TET_{Q_j} = TET_{Q_j} - \frac{1}{\phi_{Q_j}};$$

and (b) in the event a message is dequeued from a queue, Qk, selected by the WFQ scheduler, the TET for queue
Qk for the corresponding shaper scheduler is adjusted by

$$TET_{Q_k} = TET_{Q_k} - \frac{1}{R_{Q_k}}.$$

14. Apparatus according to any of Claims 10-13, wherein the packets are of fixed size and the first and second means
for servicing the queues operate synchronously to deliver one packet from the queues to the communications link
during one fixed period time slot.

15. Apparatus according to any of Claims 10-15, wherein the WFQ weight for each queue is one of:

(a) a predetermined fixed ratio relative to other queues;

(b) proportional to the instantaneous length of the queue;

(c) equal to P/N, where N is the number of queues in a common service class and P is a percentage of idle
bandwidth allocated to that service class; and

(d) proportional to its minimum guaranteed service rate relative to the guaranteed service rates of other
queues within a common service class.



16. Apparatus according to Claim 7, wherein the communications link is an ATM link; M=2 and N=2; and wherein
queues associated with CBR and rtVBR.1 ATM traffic classes are serviced by a first shaper scheduler and a first
WFQ idle bandwidth scheduler, queues associated with rtVBR.2/3 ATM traffic classes are serviced by the first
shaper scheduler and a second WFQ idle bandwidth scheduler, and queues associated with nrtVBR.2/3, ABR, and
UBR ATM traffic classes are serviced by a second shaper scheduler and the second WFQ idle bandwidth
scheduler, and wherein the first shaper scheduler has exhaustive priority over the second shaper scheduler which has
exhaustive priority over the first WFQ idle bandwidth scheduler which has exhaustive priority over the second WFQ
idle bandwidth scheduler.